Smart Paper Notes

Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

Ricardo Baptista , Lianghao Cao , Joshua Chen , Omar Ghattas , Fengyi Li , Youssef M. Marzouk , J. Tinsley Oden

Category: Statistical Machine Learning

2022-06-22

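We consider the Bayesian calibration of models describing block copolymer (BCP) self-assembly, using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport, together with summary statistics of the image data. We also show that the expected information gain (EIG) from the data about the model parameters can be computed at no significant additional cost. Lastly, we present a numerical case study based on the Ohta-Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy-based and Fourier-based summary statistics and quantify their informativeness using EIG. We demonstrate the power of the proposed approach for studying the effect of data corruption and experimental design on the calibration results.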

Translated by Google Translate
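
The EIG mentioned in the abstract above is the expected log ratio of posterior to prior density, averaged over parameters and data drawn from the model. The following is a minimal sketch of that quantity on a toy linear-Gaussian model, not the paper's transport-based estimator: the model `y = theta + noise`, the function names, and the parameter values are all assumptions chosen so that the Monte Carlo estimate can be checked against a closed form.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def eig_monte_carlo(prior_std, noise_std, n_samples=100_000, seed=0):
    """Estimate EIG = E[log p(theta | y) - log p(theta)] by sampling.

    Toy model (an assumption for illustration):
        theta ~ N(0, prior_std^2),  y = theta + N(0, noise_std^2).
    The posterior is Gaussian, so its density is available in closed form.
    """
    rng = random.Random(seed)
    post_var = 1.0 / (1.0 / prior_std**2 + 1.0 / noise_std**2)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, prior_std)          # draw from the prior
        y = theta + rng.gauss(0.0, noise_std)      # simulate data
        post_mean = post_var * y / noise_std**2    # exact posterior mean
        total += (log_normal_pdf(theta, post_mean, post_var)
                  - log_normal_pdf(theta, 0.0, prior_std**2))
    return total / n_samples


def eig_closed_form(prior_std, noise_std):
    """Analytic EIG for the same toy model."""
    return 0.5 * math.log(1.0 + prior_std**2 / noise_std**2)


if __name__ == "__main__":
    print(eig_monte_carlo(2.0, 1.0), eig_closed_form(2.0, 1.0))
```

In a likelihood-free setting the exact posterior density used here would be replaced by an approximation; once such a density can be evaluated, the EIG estimate is just the sample average shown in the loop.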

On the representation and learning of monotone triangular transport maps

Ricardo Baptista , Youssef Marzouk , Olivier Zahm

Categories: Statistical Machine Learning | Machine Learning

2020-09-22

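Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, which are approximations of the Knothe-Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on the properties of the optimization problem that arises when learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima, and we show that, for target distributions satisfying certain tail conditions, the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.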
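
One construction consistent with "invertible transformations of smooth functions" is to pass the derivative of an unconstrained smooth function through a positive function and integrate, which yields a strictly increasing map. The sketch below does this in one dimension only; the polynomial `f`, the softplus rectifier, the midpoint quadrature, and all names are assumptions made for illustration and are not taken from the abstract.

```python
import math


def softplus(z):
    """Numerically stable softplus log(1 + exp(z)); strictly positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def make_monotone_map(coeffs, n_quad=200):
    """Build S(x) = f(0) + integral_0^x softplus(f'(t)) dt.

    `coeffs` = (c0, c1, c2, ...) are the coefficients of an arbitrary
    polynomial f(t) = c0 + c1*t + c2*t^2 + ...; f itself need not be
    monotone. Since S'(x) = softplus(f'(x)) > 0, S is increasing.
    """
    def f_prime(t):
        return sum(k * c * t ** (k - 1) for k, c in enumerate(coeffs) if k > 0)

    def S(x):
        # Substitute t = x * s so the integral runs over s in [0, 1];
        # this handles negative x without special cases.
        h = 1.0 / n_quad
        mean_rate = h * sum(
            softplus(f_prime(x * (i + 0.5) * h)) for i in range(n_quad)
        )
        return coeffs[0] + x * mean_rate

    return S


if __name__ == "__main__":
    S = make_monotone_map((0.5, 1.0, -0.8, 0.1))
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, S(x))
```

The polynomial in the usage example decreases between roughly t = 0.7 and t = 4.6, yet the resulting `S` is increasing everywhere, which is the point of the construction: the optimization can range over unconstrained functions while the map stays monotone.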

Assessment of creditworthiness models privacy-preserving training with synthetic data

Ricardo Muñoz-Cancino , Cristián Bravo , Sebastián A. Ríos , Manuel Graña

Category: Machine Learning

2022-12-31

Credit scoring models are the primary instrument used by financial institutions to manage credit risk. Research on behavioral scoring is scarce because data access is difficult: financial institutions must maintain the privacy and security of borrowers' information, which keeps them from collaborating in research initiatives. In this work, we present a methodology for evaluating the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality becomes increasingly poor as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of only 3% in AUC and 6% in KS compared with models trained with real data. These results have a significant impact, since they encourage credit risk research based on synthetic data, making it possible to maintain borrowers' privacy and to address problems that until now have been hampered by the limited availability of information.
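
The two metrics quoted above, AUC and KS, can both be computed directly from a model's scores on a labeled hold-out set. The sketch below uses the pairwise (Mann-Whitney) form of AUC and the maximum gap between the two empirical score distributions for KS; the function names and the toy data in the usage note are assumptions for illustration, not the paper's evaluation code.

```python
def split_scores(labels, scores):
    """Separate scores of positive (label 1) and negative (label 0) cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample size,
    which is fine for a small illustration.
    """
    pos, neg = split_scores(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Kolmogorov-Smirnov statistic: the largest gap between the empirical
    score distributions of the positive and negative classes."""
    pos, neg = split_scores(labels, scores)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(n <= t for n in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
    print(auc_score(labels, scores), ks_statistic(labels, scores))
```

Comparing a model trained on synthetic data with one trained on real data then amounts to evaluating both functions on the same real hold-out set for each model and taking the difference.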

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Transformer-based normative modelling for anomaly detection of early schizophrenia

Pedro F Da Costa , Jessica Dafflon , Sergio Leonardo Mendes , João Ricardo Sato , M. Jorge Cardoso , Robert Leech , Emily JH Jones , Walter H. L. Pinaya

Categories: Machine Learning | Artificial Intelligence

2022-12-08

Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses.

D2DF2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation

Yuting Wang , Ricardo Guerrero , Vladimir Pavlovic

Category: Computer Vision

2022-12-02

Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2DF2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2DF2WOD on five dual-domain image benchmarks. The results show that our method consistently improves object detection and localization compared with state-of-the-art methods.

Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers

Frederico Dias Souza , João Baptista de Oliveira e Souza Filho

Categories: Natural Language Processing | Artificial Intelligence

2022-12-01

Text classification is a natural language processing (NLP) task relevant to many commercial applications, like e-commerce and customer service. Naturally, classifying such excerpts accurately often represents a challenge, due to intrinsic language aspects, like irony and nuance. To accomplish this task, one must provide a robust numerical representation for documents, a process known as embedding. Embedding is a key NLP field nowadays and has advanced significantly in the last decade, especially after the introduction of the word-to-vector concept and the popularization of Deep Learning models for solving NLP tasks, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based Language Models (TLMs). Despite the impressive achievements in this field, the literature on generating embeddings for Brazilian Portuguese texts is scarce, especially when considering commercial user reviews. Therefore, this work aims to provide a comprehensive experimental study of embedding approaches targeting binary sentiment classification of user reviews in Brazilian Portuguese. The study covers models ranging from classical (Bag-of-Words) to state-of-the-art (Transformer-based) NLP models. The methods are evaluated on five open-source databases with pre-defined data partitions, made available in an open digital repository to encourage reproducibility. The fine-tuned TLMs achieved the best results in all cases, followed by the feature-based TLM, LSTM, and CNN, whose relative ranks depend on the database under analysis.

Emerging trends in machine learning for computational fluid dynamics

Ricardo Vinuesa , Steve Brunton

Category: Machine Learning

2022-11-28

The renewed interest from the scientific community in machine learning (ML) is opening many new areas of research. Here we focus on how novel trends in ML are providing opportunities to improve the field of computational fluid dynamics (CFD). In particular, we discuss synergies between ML and CFD that have already shown benefits, and we also assess areas that are under development and may produce important benefits in the coming years. We believe that it is also important to emphasize a balanced perspective of cautious optimism for these emerging approaches.

Direct deduction of chemical class from NMR spectra

Stefan Kuhn , Carlos Cobas , Agustin Barba , Simon Colreavy-Donnelly , Fabio Caraffini , Ricardo Moreira Borges

Categories: Artificial Intelligence | Machine Learning

2022-11-06

This paper presents a proof-of-concept method for classifying chemical compounds directly from NMR data without doing structure elucidation. This can help to reduce time in finding good structure candidates, as in most cases matching must be done by a human engineer, or at the very least a process for matching must be meaningfully interpreted by one. Therefore, for a long time automation in the area of NMR has been actively sought. The method identified as suitable for the classification is a convolutional neural network (CNN). Other methods, including clustering and image registration, have not been found suitable for the task in a comparative analysis. The result shows that deep learning can offer solutions to automation problems in cheminformatics.

Collaborative Anomaly Detection

Ke Bai , Aonan Zhang , Zhizhong Li , Ricardo Heano , Chong Wang , Lawrence Carin

Category: Machine Learning

2022-09-20

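In recommendation systems, items are likely to be exposed to various users, and we would like to learn about the familiarity of a new user with an existing item. This can be formulated as an anomaly detection (AD) problem that distinguishes between "common users" (nominal) and "fresh users" (anomalous). Considering the sheer volume of items and the sparsity of user-item paired data, independently applying conventional single-task detection methods to each item quickly becomes difficult, while correlations between items are ignored. To address this multi-task anomaly detection problem, we propose collaborative anomaly detection (CAD), which jointly learns all tasks with an embedding that encodes the correlations among tasks. We explore CAD with conditional density estimation and conditional likelihood ratio estimation. We found that: (i) estimating a likelihood ratio learns more efficiently and yields better results than density estimation; (ii) it is beneficial to select a small number of tasks in advance to learn a task embedding model, and then use it to warm-start all task embeddings. Consequently, these embeddings can capture correlations between tasks and generalize to new correlated tasks.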
